Energy efficiency is crucial for radio frequency identification (RFID)
systems as the readers are often battery operated. The main source of
energy wastage is collision, which happens when tags access the
communication medium at the same time. Thus, an efficient anti-collision
protocol could minimize the energy wastage and prolong the lifetime of
RFID systems. In this regard, the EPCGlobal Class 1 Generation 2 (EPC-C1G2)
protocol is currently used in commercial RFID readers to provide
fast tag identification through efficient collision arbitration using the Q
algorithm. However, this protocol requires a large control message
overhead for its operation. Thus, a reinforcement learning based
anti-collision protocol (RF-DFSA) is proposed to provide better time system
efficiency while being energy efficient through the minimization of control
message overhead. The proposed RF-DFSA was evaluated through
extensive simulations and compared with the variants of the EPC-C1G2
algorithm that are currently used in commercial
readers. The results show that the proposed RF-DFSA performs
identically to the very efficient EPC-C1G2 protocol in terms of time system
efficiency while requiring considerably fewer control messages
for its operation.

1. INTRODUCTION

Radio frequency identification (RFID) is a technology that uses radio waves to identify a large
number of goods swiftly. This technology has gained widespread acceptance in various
fields of application such as inventory management, logistics, retail and dairy farms [1]. A typical RFID
setup has at least one reader and numerous RFID tags. RFID tags are classified as passive or active
depending on the availability of a power source on them. A passive tag gets its power from the RF signal of
the reader while an active tag has its own battery. Thus, the communication range of an active tag is much
longer than that of a passive tag. Despite this clear disadvantage in communication distance,
passive tags are largely favored due to their low deployment cost and longer lifetime. Hence, we
consider only an RFID system with one reader and numerous passive tags in this paper. Reader-to-reader
interference is outside the scope of this paper.

RFID tags communicate with the reader using a shared communication channel. Thus, collisions are
inevitable when multiple tags try to communicate their ID to the reader at the same time [2]. This problem is
particularly challenging since medium access schemes like frequency division multiple access (FDMA) or
code division multiple access (CDMA) cannot be implemented due to the computational limitations of
the passive tags. Therefore, the burden of alleviating collisions in the network falls on the reader. There are
numerous collision arbitration protocols available in the literature for RFID systems. Among those
protocols, Framed Slotted Aloha (FSA)-based protocols are popular and are used extensively in RFID
standards due to their simplicity and good performance [3, 4]. The FSA protocol is probabilistic in nature:
each tag selects a random slot in which to send its ID. Moreover, in FSA, tags do not check whether the
channel is busy or free before transmitting their ID. The use of a random transmission strategy with time slots
reduces the collision probability in this protocol. The current RFID standard, EPCGlobal Class 1 Generation 2
(EPC-C1G2), uses a variant of FSA for its operation [3].

The EPC-C1G2 protocol is used extensively in current generation commercial RFID devices since it
is the de facto standard of the industry. The simplicity and high throughput of the algorithm helped its rapid
adoption in the industry. However, EPC-C1G2 generates a large number of control messages, which increases
the energy consumption [5]. Despite this shortcoming, EPC-C1G2 is still favored as an efficient anti-collision
algorithm (ACA) thanks to its high time system efficiency (TSE) as shown in Figure 1. Therefore, in this
paper, we present an efficient ACA, namely reinforcement learning based dynamic frame slotted Aloha
(RF-DFSA), which can perform as well as the EPC-C1G2 protocol without the need for a large number of control
message exchanges. The performance of the proposed algorithm (RF-DFSA) was evaluated using
Monte Carlo simulations (5000 iterations) and compared with algorithms that are currently used in
commercial settings. RF-DFSA is very efficient in identifying tags, as exhibited by its high TSE, and
requires an order of magnitude fewer message exchanges than the best performing algorithm in
commercial readers.

The remainder of this paper is organized as follows. Section 2 discusses the current RFID standard
and related works. In Section 3, the complete methodology of the proposed RFID anti-collision protocol is
presented in detail. Section 4 presents the results and discussion of the proposed protocol in relation to selected
protocols from the literature. Finally, Section 5 concludes the paper with remarks and future work.


2. BACKGROUND INFORMATION AND RELATED WORKS 
2.1. Primer on FSA and Q-Algorithm 

In FSA, a frame is divided into slots of equal length [6]. At the beginning of each frame, the interrogator
(reader) broadcasts the frame size to the tags. Each tag then selects a slot randomly and sends its ID
information to the reader in that slot. Due to this random slot selection policy, excessive collisions are bound
to happen, depending on the tag population, if a non-optimal frame size is selected by the reader. The average
throughput, $U$, of FSA for a tag population $N$ and frame size $L$ is:

$$U = N\left(1 - \frac{1}{L}\right)^{N-1} \tag{1}$$

and the normalized throughput, $U_{norm}$, is given by:

$$U_{norm} = \frac{N}{L}\left(1 - \frac{1}{L}\right)^{N-1} \tag{2}$$

The normalized throughput is maximized when $L = N$. However, readers are not privy to the tag
population and FSA has a fixed frame size. Due to these limitations, a variant of FSA called dynamic frame
slotted Aloha (DFSA), which adapts the frame size dynamically based on an estimate of the backlog of
unidentified tags, was proposed in the literature [7]. However, its throughput drops significantly as the
number of tags increases. Thus, the Q-algorithm, a variant of DFSA, was employed instead in the current
generation RFID standard.
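
The claim that the normalized throughput peaks at $L = N$ can be checked numerically. The following is a minimal sketch that evaluates (2) over a range of frame sizes; the function names and the search range are illustrative assumptions, not part of the original work.

```python
def normalized_throughput(n_tags: int, frame_size: int) -> float:
    """Expected fraction of slots that hold exactly one tag, following (2)."""
    return (n_tags / frame_size) * (1.0 - 1.0 / frame_size) ** (n_tags - 1)


def best_frame_size(n_tags: int, max_frame: int) -> int:
    """Frame size in 1..max_frame that maximizes the normalized throughput."""
    return max(range(1, max_frame + 1),
               key=lambda size: normalized_throughput(n_tags, size))


if __name__ == "__main__":
    n = 100  # illustrative tag population
    print("best frame size:", best_frame_size(n, 4 * n))
    print("throughput at L = N:", normalized_throughput(n, n))
```

For large $N$ the value at $L = N$ approaches $1/e \approx 0.368$, which matches the slotted Aloha figure quoted in Section 2.2.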

The Q-algorithm operates using two parameters, namely a floating-point parameter, $Q_{fp}$, and a step
size, $c_q$. The rounded $Q_{fp}$ value, $Q = \mathrm{round}(Q_{fp})$, is used to set the frame size, $L = 2^Q$, and
$c_q$ is used to increase or decrease the $Q_{fp}$ value in the event of a collision or an empty slot,
respectively. An interrogation process is initiated by the reader with the
broadcast of a Query command which contains the frame size. Upon receiving this command, each tag generates
a random number in the range 0 to $2^Q - 1$ and sets its counter to the generated value. Then, the
reader interrogates the slots of the frame one by one using the Query_repeat command. For each
Query_repeat command, tags decrease their counter by one. A tag whose counter equals zero transmits its
ID information to the reader. However, if more than one tag has a counter equal to zero in the current
slot, a collision is detected by the reader and $Q_{fp}$ is increased by the pre-determined
$c_q$ value. In the case of an empty slot, $Q_{fp}$ is decreased by the same $c_q$ value. The rounded $Q_{fp}$ value
is updated after each slot until a change is detected, upon which the reader exits the current
frame and broadcasts the new frame size using the Query_adjust command. This process repeats until all tags are
identified. The standard limits the rounded $Q_{fp}$ value to the range 0 to 15 due to delay concerns. Besides, the
reader has the autonomy to decide whether to exit the current frame or continue interrogating it even when
the rounded $Q_{fp}$ value has changed.
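
The slot-by-slot behaviour described above can be sketched as a small simulation. This is a simplified illustration under stated assumptions: the reader always exits the frame as soon as the rounded value changes, timing and channel errors are ignored, and the names `q_update` and `run_q_algorithm` are hypothetical rather than taken from the standard.

```python
import random


def q_update(q_fp: float, outcome: str, c_q: float = 0.3) -> float:
    """Raise Q_fp on a collision, lower it on an empty slot, clamp to [0, 15]."""
    if outcome == "collision":
        return min(15.0, q_fp + c_q)
    if outcome == "empty":
        return max(0.0, q_fp - c_q)
    return q_fp


def run_q_algorithm(n_tags: int, q_init: int = 4, c_q: float = 0.3, seed: int = 0):
    """Identify n_tags; return per-outcome slot counts and the number of Q changes."""
    rng = random.Random(seed)
    remaining, q, q_fp = n_tags, q_init, float(q_init)
    slots = {"success": 0, "collision": 0, "empty": 0}
    adjusts = 0
    while remaining > 0:
        # Query / Query_adjust: each unidentified tag draws a counter in [0, 2^Q - 1].
        counters = [rng.randrange(2 ** q) for _ in range(remaining)]
        for slot in range(2 ** q):  # one Query_repeat per slot
            responders = counters.count(slot)
            outcome = ("empty" if responders == 0
                       else "success" if responders == 1 else "collision")
            slots[outcome] += 1
            if outcome == "success":
                remaining -= 1
            q_fp = q_update(q_fp, outcome, c_q)
            if round(q_fp) != q:  # rounded value changed: leave the frame early
                q = round(q_fp)
                adjusts += 1
                break
    return slots, adjusts


if __name__ == "__main__":
    print(run_q_algorithm(50, seed=1))
```

Counting `adjusts` gives a rough feel for the control message overhead that the slot-by-slot strategy incurs.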

One unique feature of the EPC-C1G2 algorithm is that it has different time durations for success,
collision and empty slots as per the standard. Thus, the claim that the throughput of FSA is maximized when
$L = N$ is not applicable even though EPC-C1G2 is a variant of FSA. This has been verified analytically by
[8], and the optimal frame size, $L$, for EPC-C1G2 was calculated as:

$$L = 1.46 \times (N - 1) \tag{3}$$

where $N - 1$ is the contending tag population.

The Q-algorithm has several drawbacks. The initial selection of the Q value affects its
performance significantly, but the reader has no means of knowing the tag population in the network a priori
to set the Q value appropriately. Besides, the Q adjustment strategy using $c_q$ produces excessive protocol
overhead and also performs poorly in dense tag environments.

2.2. Related works 

The limitations of the FSA propelled numerous research efforts which resulted in dynamic frame 
length Aloha, also known as dynamic frame slotted Aloha (DFSA) [7]. In DFSA, frame size is adjusted at 
each communication interval based on the estimated number of backlog tags given by: 

$$L = 2.39 \times C \tag{4}$$

where, C is the number of collided slots in the previous frame. DFSA achieves normalized throughput of 
0.426 as compared to 0.368 in slotted Aloha. Hence, it is clear that the performance of FSA can be further 
improved by dynamically adapting the frame size which in turn depends on the prediction accuracy of the 
number of unidentified tags. 

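Vogt developed an estimate of the number of unidentified (backlog) tags based on the number of empty ($E$),
success ($S$) and collision ($C$) slots in the previous frame [9]. The author selected the $N$ which minimizes
the distance between the vector of expected slot counts and the vector of observed slot counts:

$$\varepsilon_{vd} = \min_{N} \left\| \begin{pmatrix} a_0 \\ a_1 \\ a_m \end{pmatrix} - \begin{pmatrix} E \\ S \\ C \end{pmatrix} \right\| \tag{5}$$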


The expected values of empty, success and collision slots are: 


$$a_0 = L\left(1 - \frac{1}{L}\right)^{N} \tag{6}$$

$$a_1 = N\left(1 - \frac{1}{L}\right)^{N-1} \tag{7}$$

$$a_m = L - a_0 - a_1 \tag{8}$$
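
The estimator described above can be sketched as a brute-force search over candidate tag counts. This is an illustrative sketch rather than the implementation used in [9]: the Euclidean norm, the search bound `max_tags` and the function names are assumptions.

```python
import math


def expected_slots(n_tags: int, frame_size: int):
    """Expected (empty, success, collision) slot counts, following (6)-(8)."""
    if n_tags == 0:
        return float(frame_size), 0.0, 0.0
    p = 1.0 - 1.0 / frame_size  # probability that a given tag avoids a given slot
    a0 = frame_size * p ** n_tags
    a1 = n_tags * p ** (n_tags - 1)
    return a0, a1, frame_size - a0 - a1


def estimate_tags(empty: int, success: int, collision: int, max_tags: int = 2000) -> int:
    """Tag count whose expected slot vector is closest to the observed one."""
    frame_size = empty + success + collision
    observed = (empty, success, collision)
    # Every collision slot holds at least two tags, which gives a lower bound.
    lower = success + 2 * collision
    return min(range(lower, max(lower, max_tags) + 1),
               key=lambda n: math.dist(expected_slots(n, frame_size), observed))


if __name__ == "__main__":
    print(estimate_tags(empty=8, success=6, collision=2))
```

Note that when every slot collides, the distance keeps shrinking as the candidate count grows, so the result is capped by `max_tags` in that case.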


Nevertheless, it is intuitively clear that an ACA should over-allocate slots at the beginning of the
interrogation round and reduce the allocation gradually over time to reduce the delay. However, existing
algorithms in the literature use a static policy in which a constant, such as the one in (4), is used to set the
frame size regardless of the fact that the number of unidentified tags decreases over time. Thus, in this work,
we used reinforcement learning to address both the over-allocation and static constant issues by learning an
optimal policy.


3. METHODOLOGY 

In this section, we present a novel and efficient frame adaptation method called reinforcement 
learning based dynamic frame slotted Aloha (RL-DFSA). We used the Q-learning algorithm since it is known 
to be one of the most effective and popular algorithms to find an optimal policy in the absence of transition 
probability and reward function [10]. 

Q-learning is a model-free reinforcement learning algorithm which learns by interacting with the
environment and updating a Q-value for each state-action pair. The Q-value denotes the preference for taking an
action over all other available actions when the system is in a certain state. Formally, for each state $s_t \in S$
and action $a_t \in A$, the Q-value is updated by:

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha\left[r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t)\right] \tag{9}$$

where $\alpha$ is the learning rate, $\gamma$ is the discount factor and $r_{t+1}$ is the delayed reward.
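
As a concrete illustration of the update in (9), the sketch below applies one step to a small table. The table layout (a dictionary mapping each state to a list of per-action values) and the function name are assumptions made for this example; the default learning rate and discount factor are the values listed in Table 1.

```python
from collections import defaultdict


def q_learning_update(q_table, state, action, reward, next_state,
                      alpha=0.1, gamma=0.9):
    """Apply one Q-learning step in place and return the new Q(state, action)."""
    best_next = max(q_table[next_state])  # max over a' of Q(s_{t+1}, a')
    td_error = reward + gamma * best_next - q_table[state][action]
    q_table[state][action] += alpha * td_error
    return q_table[state][action]


if __name__ == "__main__":
    n_actions = 3  # illustrative
    q = defaultdict(lambda: [0.0] * n_actions)
    # Starting from an all-zero table, a reward of -1 moves the entry to -0.1.
    print(q_learning_update(q, state=2, action=0, reward=-1.0, next_state=1))
```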

We now describe how Q-learning can be used to adapt the framed slotted Aloha protocol
dynamically in RFID systems. We are well aware of the computational and space constraints of a typical
RFID reader. However, there are numerous powerful current generation RFID readers, such as
GAORFID and RapidRadio, which have ARM processors and memory card support [11, 12]. We envision
our algorithm running on these powerful readers to improve their tag reading efficiency.

The proposed RF-DFSA has two phases, namely learning (exploration + exploitation) and testing
(exploitation only), due to the technical difficulty of running learning and testing concurrently. For
1000 tags, RF-DFSA requires 20,000 iterations (~12 minutes) to converge to an optimal policy.
Therefore, the learning and testing phases were conducted separately due to time concerns. Readers are reminded
that RF-DFSA can run online on current generation high-end RFID readers since the algorithm only needs to
run as long as there are tags to be read, unlike the 5000 iterations for each tag population in the current
simulations. Moreover, the slow convergence of the algorithm is due to the stochastic nature of our
application and to the fact that Q-learning itself is slow, as rightly observed by [13].

Framed slotted Aloha is considered in this work and Q-learning is used to capture the learning
experience. A frame comprises multiple slots in which the reader communicates with nearby tags. Q-learning
is applied on the reader side as the tags are too primitive for the needed computation. A reader
initializes an interrogation round and continuously adapts the frame size until all the tags in range are
identified. However, the reader is not privy to the tag population, as in real-world applications. Thus, the
reader needs to estimate the contending tag population and adapt the frame size based on the available
information, such as the numbers of collision, empty and successful slots in the previous frame. There are
numerous tag estimation and frame size adaptation methods available in the literature, as reviewed in
Section 2. In this work, Q-learning was used to solve both the tag estimation and frame size adaptation
problems of the RFID system.

It is evident from Section 2 that the frame size should be 1.46 times the contending tag
population. Thus, the focus shifts from frame size estimation to estimation of the number of tags, since the
efficiency of the algorithm now rests solely on the accuracy of the tag estimation algorithm. The number of
tags transmitting in the same slot can vary according to the number of available slots and the tag population.
However, the lower bound for the number of tags in a collision slot is two. In this work, we defined the
actions for Q-learning based on this intuition. Initially, eleven actions were defined as follows:

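$$\text{Action 1} = 1.46 \times 2.0 \times \text{number\_of\_collision} \tag{10}$$

$$\text{Action 2} = 1.46 \times 2.2 \times \text{number\_of\_collision} \tag{11}$$

and so on until,

$$\text{Action 11} = 1.46 \times 4.0 \times \text{number\_of\_collision} \tag{12}$$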
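
The eleven actions in (10)-(12) differ only in the assumed number of tags per collision slot, which steps from 2.0 to 4.0 in increments of 0.2. A minimal sketch of this action set follows; the names are illustrative, and the result is left unrounded because the rounding rule is not stated here.

```python
# Assumed tags-per-collision-slot multipliers for Actions 1 to 11.
COLLISION_MULTIPLIERS = [round(2.0 + 0.2 * k, 1) for k in range(11)]


def frame_size_for_action(action: int, number_of_collision: int) -> float:
    """Frame size proposed by a 1-based action index, following (10)-(12)."""
    return 1.46 * COLLISION_MULTIPLIERS[action - 1] * number_of_collision


if __name__ == "__main__":
    # With 10 collided slots, Action 1 proposes 1.46 * 2.0 * 10 slots.
    print(frame_size_for_action(1, 10))
    print(frame_size_for_action(11, 10))
```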


The number of actions was limited to eleven since increasing it indefinitely would increase the
computational and space complexities exponentially. The state of the learning agent is defined as the number
of collisions in the previous frame. The goal of the learning agent is to reduce the number of collisions to
zero. The agent is assisted in its task through a reward function. The reward function for this work is defined
using the collision ratio, $\text{ratio} = \frac{\text{number\_of\_collision}}{\text{frame size}}$, as follows:

$$\text{reward} = \begin{cases} -1, & 0 < \text{ratio} \le \frac{1}{4} \\ -2, & \frac{1}{4} < \text{ratio} \le \frac{1}{2} \\ -4, & \frac{1}{2} < \text{ratio} \le \frac{3}{4} \\ -8, & \text{ratio} > \frac{3}{4} \end{cases} \tag{13}$$


An initial study was conducted to identify dominant actions based on the cumulative sum of the
Q-values. Dominant actions were characterized by a relatively high cumulative sum of the Q-values. We
ranked the cumulative sums from highest to lowest and identified the top three actions as dominant. Only three
actions (4, 2 and 1) were selected since the differences in the cumulative sum of the Q-values between the third
and the following actions were insignificant. Using the dominant actions identified in the initial study, another
simulation was performed to acquire the optimal policy and Q-matrix for a tag population of 1000. The
parameters of the RL-DFSA algorithm for the initial study are presented in Table 1. The initial state of the agent
can be any arbitrary value except one, since state one is the goal state. The timing parameters given in Table 2
were used for all our simulations. The pseudocode of RL-DFSA is presented in Algorithm 1. In the testing
phase, the learned Q-matrix was used to select an optimal action in each state.


Table 1. RL-DFSA parameters for the initial study

Parameter             Value
Initial state         2
Number of actions     11
Learning rate, α      0.1
Discount rate, γ      0.9
Exploration, ε        0.3
Epsilon decay rate    0.99971
Maximum iterations    30,000
Number of tags        1000
Initial frame size    16


Table 2. EPC-Gen2 reader interrogation parameters [14]

Parameter    Value
Tari         6.5 µs
RTcal        16.25 µs
BLF          394 kHz
T1           20.84 µs
T2           7.61 µs
TRext        1
M            1
Trn16        126.9 µs
Tepc         695.43 µs
T3           25.381 µs




Algorithm 1 RL-DFSA

1.  Set $t = 0$, max_iteration = 30000 and initialize $\alpha$, $\gamma$, $\varepsilon$ and the Q-values $Q(s_t, a_t)$ for all $s_t \in S$ and $a_t \in A$.
2.  while $t <$ max_iteration do
3.      $s_t = 2$ and select random action, $a_t$
4.      Frame size = 16
5.      Collision = 0; Success = 0; Empty = 0
6.      while $s_t \neq 1$ do
7.          Broadcast frame size and get C, S, E.
8.          Next state $s_{t+1} = C + 1$
9.          Get reward $r_t = r(s_t, a_t)$.
10.         Update Q-value: $Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha[r_t + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t)]$
11.         if $\varepsilon >$ random float between 0 and 1
12.             Select random action, $a_t$
13.         else
14.             Select next action $a_t = \arg\max_{a \in A} Q(s_t, a)$
15.         end
16.         Frame size = frame size given by action $a_t$
17.         $s_t$ = Next state $s_{t+1}$
18.     end while
19.     $t = t + 1$ and $\varepsilon = \varepsilon \times$ decay_rate
20. end while
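
For readers who want to experiment, the following is one possible Python rendering of Algorithm 1. It is a sketch under stated assumptions, not the MATLAB code used for the reported results: the frame simulation (`run_frame`), the rounding of the frame size, the zero reward for a collision-free frame, the inclusive upper bounds in the reward, and the choice to pick the next action greedily from the next state (the state in which that action will be applied) are all assumptions of this sketch.

```python
import random
from collections import defaultdict

# Tags-per-collision-slot multipliers for the eleven actions in (10)-(12).
MULTIPLIERS = [round(2.0 + 0.2 * k, 1) for k in range(11)]


def reward(collisions: int, frame_size: int) -> float:
    """Piecewise penalty on the collision ratio, following (13)."""
    ratio = collisions / frame_size
    if ratio == 0:
        return 0.0  # assumption: (13) does not define a reward for ratio = 0
    if ratio <= 0.25:
        return -1.0
    if ratio <= 0.5:
        return -2.0
    if ratio <= 0.75:
        return -4.0
    return -8.0


def run_frame(n_tags: int, frame_size: int, rng: random.Random):
    """Each tag picks a slot uniformly; return (collision, success, empty) counts."""
    occupancy = [0] * frame_size
    for _ in range(n_tags):
        occupancy[rng.randrange(frame_size)] += 1
    success, empty = occupancy.count(1), occupancy.count(0)
    return frame_size - success - empty, success, empty


def train(n_tags=1000, max_iteration=30000, alpha=0.1, gamma=0.9,
          epsilon=0.3, decay_rate=0.99971, seed=0):
    """Learn a Q-table mapping state (collisions + 1) to per-action values."""
    rng = random.Random(seed)
    q = defaultdict(lambda: [0.0] * len(MULTIPLIERS))
    for _ in range(max_iteration):
        state, action = 2, rng.randrange(len(MULTIPLIERS))
        frame_size, remaining = 16, n_tags
        while state != 1:  # state 1 means the last frame had no collision
            collisions, success, _empty = run_frame(remaining, frame_size, rng)
            remaining -= success  # identified tags stay silent afterwards
            next_state = collisions + 1
            r = reward(collisions, frame_size)
            q[state][action] += alpha * (r + gamma * max(q[next_state]) - q[state][action])
            if epsilon > rng.random():  # explore
                action = rng.randrange(len(MULTIPLIERS))
            else:  # exploit
                action = max(range(len(MULTIPLIERS)), key=q[next_state].__getitem__)
            frame_size = max(1, round(1.46 * MULTIPLIERS[action] * collisions))
            state = next_state
        epsilon *= decay_rate
    return q


if __name__ == "__main__":
    table = train(n_tags=50, max_iteration=200, seed=1)  # small run for illustration
    print(len(table), "states visited")
```

The defaults mirror Table 1, but a full run with 1000 tags and 30,000 iterations in pure Python may take considerably longer than the small illustrative call shown.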


4. SIMULATION RESULTS AND DISCUSSION 

The collision arbitration performance of the proposed algorithm and the reference algorithms,
namely EPC-Q-Slot, EPC-Q-Frame [15], EPC-Fixed [15] and Ideal, was compared through extensive
Monte Carlo simulations in the MATLAB software environment. All the EPC algorithms are variants of the
EPC-C1G2 algorithm discussed in Section 2. The EPC-Q-Slot algorithm simulates the scenario where
the reader issues the Query_adjust command as soon as a change in the round($Q_{fp}$) value is
detected. In contrast, EPC-Q-Frame simulates the scenario where the reader only issues a Query_adjust
command at the beginning of a new frame, even if the round($Q_{fp}$) value changed mid-frame.
The EPC-Fixed algorithm simulates commercial readers with a fixed frame size. Finally, the Ideal
algorithm was used as an upper bound on the performance achievable by an algorithm which knows
the tag population a priori. Two metrics, namely the time system efficiency and the average number of
frames, are considered to evaluate the performance of RL-DFSA.

a. Time system efficiency (TSE) [8]: This metric gives the percentage of time spent successfully
identifying tags. It is calculated as follows:

$$\text{TSE} = \frac{\text{Success} \times T_s}{\text{Success} \times T_s + \text{Empty} \times T_e + \text{Collision} \times T_c} \tag{14}$$

where Success, Collision and Empty denote the number of successful, collided and empty slots in the frame,
respectively, and $T_s$, $T_e$ and $T_c$ are the durations of successful, empty and collision slots, respectively.
b. Average number of frames per round: This metric gives the average number of frames required by
the reader for each interrogation round.
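
The TSE metric in (14) is straightforward to compute once the slot counts and slot durations are known. The sketch below shows the arithmetic; the slot durations in the usage example are placeholder values chosen for illustration only, since the per-slot durations derived from Table 2 are not spelled out here.

```python
def time_system_efficiency(success: int, empty: int, collision: int,
                           t_success: float, t_empty: float, t_collision: float) -> float:
    """Fraction of the total time spent in successful slots, following (14)."""
    useful = success * t_success
    total = useful + empty * t_empty + collision * t_collision
    return useful / total if total else 0.0


if __name__ == "__main__":
    # Placeholder durations in microseconds (hypothetical, not taken from Table 2).
    print(time_system_efficiency(success=100, empty=40, collision=30,
                                 t_success=900.0, t_empty=60.0, t_collision=200.0))
```

Because empty slots are typically much shorter than successful or collided ones, a frame size larger than the tag population costs relatively little time, which is consistent with the factor of 1.46 in (3).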

The primary time parameters for the simulations were obtained from [14] and are presented in
Table 2. The simulation scenario was divided into two environments, sparse (10-100 tags) and dense
(100-1000 tags), for easier interpretation of the results. The initial simulation parameters of all the
algorithms are given in Table 3. The performance of RL-DFSA in terms of TSE was evaluated by comparing
it with the other four algorithms for various numbers of tags, as shown in Figure 1.

Table 3. Initial parameters for the algorithms

Algorithm      Parameter             Value
EPC-Q-Slot     Q                     4
               c_q                   0.3
EPC-Q-Frame    Q                     4
               c_q                   0.3
EPC-Fixed      Q (Sparse)            4
               Q (Dense)             7
               c_q                   0.3
Ideal          Initial frame size    16
               Subsequent frame      1.46 × no. of remaining tags
RL-DFSA        Initial frame size    16
               Subsequent frame      Based on the learned policy
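
The Ideal baseline in Table 3 can be sketched as a short simulation: the first frame has 16 slots and every subsequent frame is sized at 1.46 times the number of tags still unidentified. This is an illustrative sketch, not the simulation code behind the reported figures; the rounding of the frame size and the function name are assumptions.

```python
import random


def ideal_round(n_tags: int, initial_frame: int = 16, factor: float = 1.46, seed: int = 0):
    """Run one interrogation round with perfect knowledge of the remaining tags."""
    rng = random.Random(seed)
    remaining, frame_size, frames = n_tags, initial_frame, 0
    totals = {"success": 0, "empty": 0, "collision": 0}
    while remaining > 0:
        occupancy = [0] * frame_size
        for _ in range(remaining):  # each tag picks one slot uniformly at random
            occupancy[rng.randrange(frame_size)] += 1
        success, empty = occupancy.count(1), occupancy.count(0)
        totals["success"] += success
        totals["empty"] += empty
        totals["collision"] += frame_size - success - empty
        remaining -= success
        frames += 1
        frame_size = max(1, round(factor * remaining))  # rounding is an assumption
    return frames, totals


if __name__ == "__main__":
    print(ideal_round(100, seed=3))
```

The returned frame count corresponds to the second metric (frames per round), and the slot totals can be fed into (14) to obtain the TSE.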



Figure 1. Time system efficiency of the algorithms for a different number of tags 


The Ideal algorithm serves as an upper bound on the performance that can be achieved when the reader
knows exactly the number of tags to be identified and sets the frame size to 1.46 times the contending tag
population. In both the sparse and dense tag environments, EPC-Q-Slot performs nearly identically to RL-DFSA,
as can be seen in Figure 1. Both algorithms achieve near-optimal performance thanks to their efficient
frame size adaptation mechanisms. In the case of EPC-Q-Slot, the ability to adjust the frame size as soon as
the rounded $Q_{fp}$ changes provides it with the needed dynamic adaptation for any tag population. On
the other hand, RF-DFSA selects an optimal frame size for each iteration based on the positive or
negative feedback received for its decision on the previous frame. This ability to adjust its action based on
rewards and punishments makes it suitable for dynamic environments such as RFID systems. In contrast, both the
EPC-Q-Frame and EPC-Fixed algorithms are unable to cope with the dynamic tag population, as can be
seen from their unstable and inferior TSE performance. EPC-Fixed performed the worst since it lacks
dynamic frame size adaptation when the number of tags changes. Overall, RF-DFSA is 6.3%-250%,
0.4%-18.6% and 0.4%-5.7% better in TSE for the sparse tag environment as compared to the EPC-Fixed,
EPC-Q-Frame and EPC-Q-Slot algorithms, respectively. Also, for the dense tag environment, RF-DFSA
performs 5.3%-707.4% and 17%-578.8% better as compared to the EPC-Fixed and EPC-Q-Frame algorithms,
respectively. The performance of the RF-DFSA and EPC-Q-Slot algorithms is nearly identical in both the sparse
and dense tag environments as far as the TSE metric is concerned.

It is evident from the literature that the radio transmitter module of sensor nodes consumes almost 
three times more energy as compared to the node’s microcontroller [16]. So, it is perfectly logical to opt for 
longer computation if it can reduce the required number of radio transmissions. In our case, RF-DFSA 
requires a much larger number of computations as compared to the other four algorithms since it needs to 
update the state-reward matrix using (9) after each transmission. However, the tradeoff between the energy 
required for communication and computation becomes irrelevant if the performance of the proposed 
algorithm is subpar. Incidentally, RF-DFSA performs identically to EPC-Q-Slot which is the best performing 
commercial algorithm while requiring an order of magnitude lower number of frames as presented in Figure 
2. The difference in the number of frames is much more pronounced in the dense tag environment. Through 
reducing the number of Query_adjust commands by selecting optimal frame in each round, RF-DFSA 
improves energy efficiency. Meanwhile, EPC-Fixed performed worse as compared to EPC-Q-Slot in sparse 
tag environment since the frame size of 16 is not enough to accommodate a larger number of tags. However, 
its performance increases in dense tag environments since the new frame size is 128. Generally, algorithms 
which adapt their frame size following a frame-by-frame adjustment strategy require a smaller number of 
control messages as compared to the algorithms with slot-by-slot adjustment techniques. The reason for using
a slot-by-slot adjustment strategy in commercial readers is mainly its high TSE performance, which is
hardly achievable by normal frame-by-frame algorithms, as shown in Figure 1. However, RL-DFSA can
guarantee a TSE performance similar to that of EPC-Q-Slot without the high communication cost
incurred by the issuance of numerous Query_adjust commands. This high efficiency of the proposed algorithm is
due to its ability to learn an optimal policy even in a dynamic environment such as an RFID system.

Energy efficient anti-collision algorithm for the RFID networks (Murukesan Loganathan)
ISSN: 2302-9285


[Figure 2 panels: number of frames per round versus number of tags, for the sparse (10-100 tags) and dense (100-1000 tags) environments; series: EPC-Fixed, EPC-Q-Frame, EPC-Q-Slot, RL-DFSA, Ideal]


Figure 2. Average number of frames required per interrogation round for various algorithms 


5. CONCLUSION AND FUTURE WORKS 

Collision arbitration in RFID systems is crucial since collisions waste the precious energy of the
battery-operated reader. This is not trivial since the tag population may vary from tens to thousands. Thus, in
this work, we proposed the RL-DFSA algorithm to address the collision and energy efficiency problems of RFID
systems. RL-DFSA uses the Q-learning algorithm to select, on each iteration, an optimal action, which
corresponds to a frame size, among the available actions based on the reward or punishment it received for
the previous action that led it to its current state. The proposed algorithm was compared with the EPC-Fixed,
EPC-Q-Frame, EPC-Q-Slot and Ideal algorithms in sparse and dense tag environments. The TSE of
RL-DFSA and EPC-Q-Slot are almost identical while being very close to the performance of the
Ideal algorithm. However, RL-DFSA achieved this near-optimal performance without
propagating excessive control messages, as is the case with the EPC-Q-Slot algorithm (Figure 2). Thus,
RL-DFSA is energy efficient as compared to EPC-Q-Slot since the communication
cost of a system is known to be very high compared to its computational cost.